
NAVAL  POSTGRADUATE  SCHOOL 

Monterey, California


JACKKNIFING  THE  KAPLAN-MEIER  SURVIVAL 
ESTIMATOR  FOR  CENSORED  DATA: 
SIMULATION  RESULTS  AND  ASYMPTOTIC  ANALYSIS 

by 

D.  P.  Gaver 

R.  G.  Miller 

January  1982 


Approved for public release; distribution unlimited.


Prepared for:
Office of Naval Research
Arlington, Virginia 22217


NAVAL  POSTGRADUATE  SCHOOL 
MONTEREY,  CALIFORNIA 

Rear Admiral J. J. Ekelund, Superintendent
David A. Schrady, Acting Provost

Work  on  this  report  was  partially  sponsored  by  an  Office  of  Naval  Research 
contract  at  the  Naval  Postgraduate  School. 

Reproduction  of  all  or  part  of  this  report  is  authorized. 


Unclassified

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)

1.   REPORT NUMBER: NPS55-82-004

2.   GOVT ACCESSION NO.:

3.   RECIPIENT'S CATALOG NUMBER:

4.   TITLE (and Subtitle):
     JACKKNIFING THE KAPLAN-MEIER SURVIVAL ESTIMATOR
     FOR CENSORED DATA: SIMULATION RESULTS AND
     ASYMPTOTIC ANALYSIS

5.   TYPE OF REPORT & PERIOD COVERED: Technical

6.   PERFORMING ORG. REPORT NUMBER:

7.   AUTHOR(s): D. P. Gaver and R. G. Miller

8.   CONTRACT OR GRANT NUMBER(s):

9.   PERFORMING ORGANIZATION NAME AND ADDRESS:
     Naval Postgraduate School
     Monterey, CA 93940

10.  PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:
     61153N; RR14-05-OE
     N0001482WR20017

11.  CONTROLLING OFFICE NAME AND ADDRESS:
     Naval Postgraduate School
     Monterey, CA 93940

12.  REPORT DATE: January 1982

13.  NUMBER OF PAGES: 28

14.  MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
     Chief of Naval Research
     Arlington, Virginia 22217

15.  SECURITY CLASS. (of this report): Unclassified

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:

16.  DISTRIBUTION STATEMENT (of this Report):
     Approved for public release; distribution unlimited

17.  DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report):

18.  SUPPLEMENTARY NOTES:

19.  KEY WORDS (Continue on reverse side if necessary and identify by block number):
     survival probability
     reliability
     non-parametric estimates
     jackknife method

20.  ABSTRACT (Continue on reverse side if necessary and identify by block number):

The Kaplan-Meier estimate is a non-parametric maximum likelihood estimate for
the probability of equipment or human survival. This report describes a
jackknife confidence limit procedure for probability of survival, based on
K.-M., and describes confidence limit properties by simulation and by asymptotic
analysis.

DD FORM 1473 (1 JAN 73)    EDITION OF 1 NOV 65 IS OBSOLETE
S/N 0102-014-6601

Unclassified

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


JACKKNIFING  THE  KAPLAN-MEIER  SURVIVAL  ESTIMATOR 
FOR  CENSORED  DATA:   SIMULATION  RESULTS  AND  ASYMPTOTIC  ANALYSIS 

Donald  P.  Gaver 
Rupert  G.  Miller,  Jr. 

1.   Introduction

Censored data problems arise frequently in medical applications,
and also in engineering system reliability applications. For example,
in medical survivorship studies some subjects may be lost to
follow-up, or available data may be analyzed before all subjects
have expired. In the equipment reliability context observed units
may still be in operation, perhaps after several previous failures,
at the time of the analysis. Considerable attention has recently
been devoted to developing informative statistical methods
for handling data of this type (see Kalbfleisch and Prentice (1980)).

It is straightforward, though sometimes computationally
tedious, to deal with censoring in a parametric manner, i.e. by
assuming a specific form for the lifetime distribution (exponential,
Weibull, lognormal, or whatever) and then estimating parameters,
perhaps by maximum likelihood. The approach adopted here
is, instead, to begin with the Kaplan-Meier (1958) product-limit
estimator of survival probability. This estimator is the
non-parametric maximum likelihood estimator of a distribution function
from a sample of singly censored data. Then, since the jackknife
technique has been shown to be widely useful for obtaining robust
intervals, cf. Miller (1974), it is applied to the Kaplan-Meier
estimate in order to obtain approximate confidence intervals for
the survival probability. It is reasonable to argue that if the
jackknife is to be valid under complex censoring it must perform
correctly in this simplest of all situations, and if it does work
here then it is likely to work in more complex settings as well.
Therefore, in a sense we are reporting on the results of a pilot
study of an attractive procedure.

In this paper the effect of jackknifing the Kaplan-Meier
estimate will be examined both by Monte Carlo simulation (sampling
experiments) and by asymptotic analysis. In Section 4, we report
on the results of some extensive Monte Carlo investigations,
comparing confidence limits for survival probability obtained via
the jackknife with those from other techniques. It will be seen that
the jackknife seems to perform well for moderate sample sizes, even
under some rather unusual conditions. In Section 5, asymptotic
results are reported that provide theoretical underpinnings for
the jackknife procedure, at least for large sample sizes.
Specifically, it is shown that the jackknifed estimate is approximately
normal with the asymptotically correct variance, and hence produces
correct confidence limits based on the Kaplan-Meier estimate. Taken by
itself, this result may not be terribly important, because an
expression for the variance of the estimator is known, and it can
be estimated by substituting estimates of any unknown functions
into the expression. However, for doubly censored data (cf.
Turnbull (1974)), and for data with censoring and truncation, the
situation is more complex (cf. Turnbull (1978)). The fact that the
jackknife works in the singly censored case makes it more likely that
it works for these more complex censoring patterns and for others
as well.


It should be noted that the bootstrap procedure, a
re-sampling approach investigated by Efron (1979, 1981), is
also applicable to complex censoring situations, apparently
giving results in good agreement with Greenwood's formula for
a particular case investigated.


2.   Formulation  of  the  Problem;  the  Kaplan-Meier  Estimate 

Suppose $x_1, x_2, \ldots, x_n$ are $n$ observed survival times,
e.g. of medical patients or of equipments subject to failure.
Some of these observations are of complete lifetimes (failure
times) but others are not, having been censored by the time of
observation. For short we refer to complete observations as
deaths, and censored observations as losses. Censoring simply
means that a "complete time" is not observed, although a "partial
time," up to the censoring, is. Censoring complicates the
problem of estimating the theoretical survival probability to time
$x$, denoted by $\bar F^0(x) = 1 - F^0(x)$.

Kaplan and Meier (1958) furnish a maximum likelihood
estimate of $\bar F^0(x)$ from among the class of admissible
distributions. This product-limit estimate, written $\bar F^0_n(x)$,
may be expressed in several equivalent ways, assuming no ties among
the observations:


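\[
\bar F^0_n(x) = \prod_{x_i < x} \left( \frac{n - r_i}{n - r_i + 1} \right)^{\delta_i}
\tag{2.1,a}
\]

\[
\phantom{\bar F^0_n(x)} = \prod_{i=1}^{n} \left( \frac{n - i}{n - i + 1} \right)^{\delta_{(i)}(x)}
\tag{2.1,b}
\]

\[
\phantom{\bar F^0_n(x)} = \prod_{i=1}^{k(x)} p_i = \prod_{i=1}^{k(x)} \left( \frac{n_i - \delta_i}{n_i} \right)
\tag{2.1,c}
\]
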

In (2.1,a), $r_i$ is the rank of $x_i$ among the ordered
observations $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$ and $\delta_i$ is
unity if $x_i$ is an observed death, being zero otherwise. In (2.1,b),

\[
\delta_{(i)}(x) =
\begin{cases}
1 & \text{if } x_{(i)} < x \text{ and } x_{(i)} \text{ is a time of death (uncensored)} \\
0 & \text{otherwise.}
\end{cases}
\tag{2.2}
\]

In (2.1,c) $n_i$ $(= n - (i-1))$ represents the number of items exposed
(to either death or loss) at the $i$th ordered time, and $k(x)$ is
the total number of deaths by time $x$.

A  numerical  example  helps  to  explain  the  estimate.   Suppose 
the  data  points  are 

1  <  2*  <  4  <  5*  <  7*  <  8  <  10 

where  the  starred  measurements  are  losses,  and  the  rest  deaths. 

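Let us estimate the survival probability to or beyond $x = 6$. Then,
since $n = 7$ and $k(6) = 2$,

\[
\bar F^0_7(6) = \left(\tfrac{6}{7}\right)^1 \left(\tfrac{5}{6}\right)^0 \left(\tfrac{4}{5}\right)^1 \left(\tfrac{3}{4}\right)^0
\quad \text{by (2.1,b)}
\]
\[
\phantom{\bar F^0_7(6)} = \left(\tfrac{6}{7}\right) \left(\tfrac{4}{5}\right) = \tfrac{24}{35} \approx 0.686
\quad \text{by (2.1,c).}
\]
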

Note that by definition (2.2) the estimate jumps down
following data values that are deaths, does not jump at
losses, and remains constant between down-jumps. Technically,
$\bar F^0_n(x)$ is a left-continuous monotonically non-increasing step
function; this makes $F^0_n(x) = 1 - \bar F^0_n(x)$, the estimated
distribution of time of death, left-continuous as well.
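
The product-limit form (2.1,b) can be sketched in a few lines of Python. This is a minimal illustration, not the program used in the study; the function name `km_survival` and the 0/1 coding of deaths and losses are assumptions introduced here, and ties are assumed absent, as in the text.

```python
def km_survival(times, deaths, x):
    """Product-limit estimate of survival to or beyond x, form (2.1,b).

    times  : observed times (no ties assumed)
    deaths : 1 if the observation is a death, 0 if it is a loss
    """
    n = len(times)
    # Indices of the observations in increasing order of time.
    order = sorted(range(n), key=lambda k: times[k])
    estimate = 1.0
    for i, k in enumerate(order, start=1):
        # delta_(i)(x) = 1 only for deaths strictly before x, as in (2.2).
        if deaths[k] and times[k] < x:
            estimate *= (n - i) / (n - i + 1)
    return estimate


# The numerical example of this section: 1 < 2* < 4 < 5* < 7* < 8 < 10.
times = [1, 2, 4, 5, 7, 8, 10]
deaths = [1, 0, 1, 0, 0, 1, 1]
print(km_survival(times, deaths, 6))  # (6/7)(4/5) = 24/35
```

Because the comparison `times[k] < x` is strict, the sketch reproduces the left-continuity noted above: evaluating at a death time itself does not yet include that death's factor.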


3.   Interval  Estimates  for  the  Kaplan-Meier  Estimate 

For a given set of data the K.-M. estimate provides a
point estimate of the survival probability. It is, of course,
desirable to assess the stability of such an estimate under
reasonable assumptions about the origin of the data; specifically it
is useful to furnish approximate confidence intervals for a
survival distribution $\bar F^0(x)$. The jackknife procedure, see Miller
(1974) and Mosteller and Tukey (1977), is one way of producing
such limits. In this section we describe the computation of
jackknife limits and of limits obtained by alternative procedures.
The procedures are compared by simulation in Section 4.

3.1.   The  Jackknife  Procedure 

The jackknife procedure is well described in Mosteller
and Tukey (1977), where it is pointed out that a preliminary
transformation to approximately symmetrize the sampling
distribution of the estimator is beneficial; see also Cressie (1981).
For this study we have chosen to utilize the classical "inverse
sine" transformation, which tends to stabilize the variance of
binomial count data and also to approximately symmetrize it. This
transformation is suggested since the number of sample members
surviving a fixed time would be binomial under ideal conditions if
there were no censoring. Initial experiments with a logistic
transformation proved to be less satisfactory, as was a simple log
transformation; in practice, both log and logistic transformations
must involve a "start," see Tukey (1977), which influences the
coverage. A natural choice is $1/(2n)$, see Cox (1972), but systematic
confidence interval undercoverage results, empirically suggesting a
larger value. Here is our procedure.

(a)  Select a value of $x$ at which to estimate survival
     probability.

(b)  Compute $\bar F^0_n(x)$, e.g. by (2.1).

(c)  Compute $A_n(x) = \sin^{-1} \sqrt{\bar F^0_n(x)}$.

(d)  Compute $\bar F^0_{n-1,j}(x)$, the K.-M. estimate leaving out the
     $j$th observation, whether it be an observed (recorded)
     death, or a loss. The formula actually used, written for
     deletion of the $i$th ordered observation, was

\[
\bar F^0_{n-1,i}(x) = \prod_{j=1}^{i-1} \left( \frac{n-j-1}{n-j} \right)^{\delta_{(j)}(x)}
\prod_{j=i+1}^{n} \left( \frac{n-j}{n-j+1} \right)^{\delta_{(j)}(x)}
\tag{3.1}
\]

(e)  Compute $A_{n-1,j}(x) = \sin^{-1} \sqrt{\bar F^0_{n-1,j}(x)}$.

(f)  Compute the $j$th pseudovalue:

\[
v_j = n A_n(x) - (n-1) A_{n-1,j}(x), \qquad j = 1, 2, \ldots, n .
\]

(g)  Find the mean and variance of the pseudovalues:

\[
\bar v = \frac{1}{n} \sum_{j=1}^{n} v_j , \qquad
s_v^2 = \frac{1}{n-1} \sum_{j=1}^{n} (v_j - \bar v)^2 , \qquad
s_v = \sqrt{s_v^2} .
\]

(h)  Compute (approximate) two-sided $(1-\alpha) \cdot 100\%$ confidence
     limits as follows:

\[
L \equiv \bar v - t_{1-\alpha/2}(n-1) \frac{s_v}{\sqrt{n}}
\le \sin^{-1} \sqrt{\bar F^0(x)}
\le \bar v + t_{1-\alpha/2}(n-1) \frac{s_v}{\sqrt{n}} \equiv U ,
\tag{3.2}
\]

     where $t_{1-\alpha/2}(n-1)$ is the $(1-\alpha/2) \cdot 100$ percent
     point of Student's $t$ with $n-1$ degrees of freedom; then
     invert to obtain (approximate) two-sided $(1-\alpha) \cdot 100\%$
     confidence limits for survival beyond $x$:

\[
\sin^2(L) \le \bar F^0(x) \le \sin^2(U) .
\tag{3.3}
\]

Theoretical justification of such a procedure for large $n$ is
given in a final section of this paper. The quality of the
product is illustrated by simulation examples to appear
subsequently.

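
Steps (a) through (h) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the program used for the study: the function names are hypothetical, the Student's $t$ percent point is passed in by the caller (`t_crit`) rather than computed, each leave-one-out estimate is obtained by simply recomputing the product-limit estimate on the reduced sample (which is what (3.1) expresses), and the limits $L$ and $U$ are clamped to $[0, \pi/2]$ before inversion so that $\sin^2$ stays monotone. The clamping is an assumption added here; the text does not discuss it.

```python
import math


def km_survival(times, deaths, x):
    """Product-limit estimate of survival to or beyond x (no ties assumed)."""
    n = len(times)
    order = sorted(range(n), key=lambda k: times[k])
    estimate = 1.0
    for i, k in enumerate(order, start=1):
        if deaths[k] and times[k] < x:
            estimate *= (n - i) / (n - i + 1)
    return estimate


def jackknife_limits(times, deaths, x, t_crit):
    """Approximate limits (3.3) using the inverse-sine transformation.

    t_crit is the (1 - alpha/2) percent point of Student's t with
    n - 1 degrees of freedom, supplied by the caller.
    """
    n = len(times)
    a_full = math.asin(math.sqrt(km_survival(times, deaths, x)))  # step (c)
    pseudo = []
    for j in range(n):
        # Steps (d)-(e): delete observation j, death or loss alike.
        rest_t = times[:j] + times[j + 1:]
        rest_d = deaths[:j] + deaths[j + 1:]
        a_del = math.asin(math.sqrt(km_survival(rest_t, rest_d, x)))
        pseudo.append(n * a_full - (n - 1) * a_del)               # step (f)
    v_bar = sum(pseudo) / n                                       # step (g)
    s_v = math.sqrt(sum((v - v_bar) ** 2 for v in pseudo) / (n - 1))
    half_width = t_crit * s_v / math.sqrt(n)                      # step (h)
    # Assumption: clamp to [0, pi/2] so the inversion is monotone.
    lower = max(v_bar - half_width, 0.0)
    upper = min(v_bar + half_width, math.pi / 2)
    return math.sin(lower) ** 2, math.sin(upper) ** 2


times = [1, 2, 4, 5, 7, 8, 10]
deaths = [1, 0, 1, 0, 0, 1, 1]
# 2.447 is the 97.5 percent point of Student's t with 6 degrees of freedom.
print(jackknife_limits(times, deaths, 6, t_crit=2.447))
```

With only seven observations the resulting interval is very wide, which is consistent with the small-sample caution expressed in the text.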
3.2.   Alternatives  to  the  Jackknife:   "Greenwood's  formula" 

The classical estimate of the variance of the estimate
$\bar F^0_n(x)$ is given by "Greenwood's formula," see Kaplan and Meier
(1958), p. 477, or Thomas and Grunkemeier (1975), p. 867. Again
when no ties are present this may be expressed as

\[
\widehat{\mathrm{Var}}\left[ \bar F^0_n(x) \right]
= \left[ \bar F^0_n(x) \right]^2 \sum_{i=1}^{k(x)} \frac{\delta_i}{n_i (n_i - \delta_i)} .
\tag{3.4}
\]

It is interesting and reassuring that this approximate formula
delivers exactly $\left(\frac{k}{n}\right) \left(\frac{n-k}{n}\right) \frac{1}{n}$,
with $k = k(x)$, as an estimated variance when all
observed events are deaths.

It follows that approximate two-sided $(1-\alpha) \cdot 100\%$
confidence limits may be obtained by this procedure:

a)  Select a value of $x$ at which to estimate survival
    probability.

b)  Compute $\bar F^0_n(x)$, the point estimate of survival
    probability.

c)  Compute $s_G^2 = \widehat{\mathrm{Var}}\left[ \bar F^0_n(x) \right]$ from (3.4).

d)  Compute approximate two-sided $(1-\alpha) \cdot 100\%$ confidence
    limits:

\[
L_G = \bar F^0_n(x) - z_{1-\alpha/2}\, s_G , \qquad
U_G = \bar F^0_n(x) + z_{1-\alpha/2}\, s_G ,
\]

    where $z_{1-\alpha/2}$ is the $(1-\alpha/2) \cdot 100$ percent point of
    the unit Normal. Then

\[
L_G \le \bar F^0(x) \le U_G
\tag{3.5}
\]

    with approximately the quoted confidence.

For justification of the above procedure when $n$ is large,
refer to Breslow and Crowley (1974). Following Thomas and
Grunkemeier (1975), we will call it the $Z_1$ procedure. Simulation
results appear subsequently.

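
The $Z_1$ procedure can be sketched in Python as below. The function names are hypothetical, and the sketch assumes no ties and a product-limit estimate that is strictly positive at $x$ (otherwise a term of the sum in (3.4) has a zero denominator). With no ties, $n_i = n - i + 1$ at the $i$th ordered observation and $\delta_i = 1$ at a death.

```python
import math
from statistics import NormalDist


def km_and_greenwood(times, deaths, x):
    """Return the product-limit estimate at x and its Greenwood variance (3.4)."""
    n = len(times)
    order = sorted(range(n), key=lambda k: times[k])
    survival, total = 1.0, 0.0
    for i, k in enumerate(order, start=1):
        if deaths[k] and times[k] < x:
            n_i = n - i + 1                    # number exposed at this death
            survival *= (n_i - 1) / n_i
            total += 1.0 / (n_i * (n_i - 1))   # delta_i / (n_i (n_i - delta_i))
    return survival, survival ** 2 * total


def z1_limits(times, deaths, x, alpha=0.05):
    """Symmetric Normal-theory limits (3.5); not truncated to [0, 1]."""
    survival, variance = km_and_greenwood(times, deaths, x)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(variance)
    return survival - half_width, survival + half_width


times = [1, 2, 4, 5, 7, 8, 10]
deaths = [1, 0, 1, 0, 0, 1, 1]
print(km_and_greenwood(times, deaths, 6))
print(z1_limits(times, deaths, 6))

# All observed events deaths: variance reduces to (k/n)((n-k)/n)(1/n).
print(km_and_greenwood([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], 2.5))
```

For the seven-point example the upper limit exceeds 1, which illustrates a known drawback of a symmetric interval on the untransformed scale when $n$ is small.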
3.3.   An  Approximate  Likelihood-Ratio  Interval  Estimate 

Thomas and Grunkemeier (1975) propose use of a
likelihood-ratio based procedure for obtaining approximate
$(1-\alpha) \cdot 100\%$ confidence limits. In outline, the procedure
approximately maximizes the likelihood of a survival function under
a constraint; this will be called the $Z_2$ procedure. For a similar
development see Madansky (1965). Specifically, one maximizes the
likelihood (5d) of Kaplan and Meier, subject to the constraint that
survival to time $x$ equals $\bar F^0$:


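\[
L = \sum_{i=1}^{k(x)} \left\{ \delta_i \ln(1 - p_i) + (n_i - \delta_i) \ln p_i \right\}
+ \lambda \left\{ \sum_{i=1}^{k(x)} \ln p_i - \ln \bar F^0 \right\}
+ \sum_{i=k(x)+1}^{n} \left\{ \delta_i \ln(1 - p_i) + (n_i - \delta_i) \ln p_i \right\}
\tag{3.6}
\]

giving estimates

\[
p_i(\lambda) =
\begin{cases}
\dfrac{n_i + \lambda - \delta_i}{n_i + \lambda} , & i = 1, 2, \ldots, k(x) ; \\[2ex]
\dfrac{n_i - \delta_i}{n_i} , & i = k(x)+1, \ldots, n
\end{cases}
\tag{3.7}
\]

and

\[
\prod_{i=1}^{k(x)} p_i(\lambda) = \bar F^0(x; \lambda)
\tag{3.8}
\]

from the constraint condition. Next (numerically) solve the
equation

\[
\frac{\bar F^0_n(x) - \bar F^0(x; \lambda)}{\bar F^0(x; \lambda) \sqrt{V(\lambda)}} = \pm z_{1-\alpha/2}
\tag{3.9}
\]

for $\lambda_L$ and $\lambda_U$, where $\bar F^0_n(x)$ is the product-limit
estimate of survival beyond $x$, $\bar F^0(x; \lambda)$ is given by (3.8),
and $z_{1-\alpha/2}$ is the $(1-\alpha/2) \cdot 100$ percent point of the
unit normal distribution. Then, according to Thomas and Grunkemeier
(see footnote, p. 867), $V(\lambda)$ may be expressed as follows:

\[
V(\lambda) = \left[ (n + \lambda)/n \right] \sum_{i=1}^{k(x)}
\frac{\delta_i}{(n_i + \lambda)(n_i + \lambda - \delta_i)}
\]
\[
\phantom{V(\lambda)} = \left[ 1 - \bar F^0(x; \lambda) \right] / \left[ \bar F^0(x; \lambda)\, n(x) \right]
\quad \text{for } \bar F^0(x; \lambda) = 1 ;
\tag{3.10}
\]

here $n(x)$ is the number of individuals exposed at $x$. Finally,
(approximate) upper and lower confidence limits for $\bar F^0(x)$ are
obtained by substituting $\lambda_L$ and $\lambda_U$ into (3.8):

\[
p_L = \prod_{i=1}^{k(x)} \left( \frac{n_i + \lambda_L - \delta_i}{n_i + \lambda_L} \right)
\quad \text{and} \quad
p_U = \prod_{i=1}^{k(x)} \left( \frac{n_i + \lambda_U - \delta_i}{n_i + \lambda_U} \right) .
\tag{3.11}
\]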



The principal difficulty with application of this
method is the numerical solution of (3.9) for the roots $\lambda_L$
and $\lambda_U$. A Newton-Raphson method was utilized in the program
developed for this study. It was only feasible to make
extensive trials of the procedure for sample size $n = 25$.

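
The root-finding step can be sketched as follows. Several things here are assumptions and should be read as such: the study used Newton-Raphson, whereas this sketch swaps in plain bisection for simplicity; equation (3.9) is taken in the form $[\bar F^0_n(x) - \bar F^0(x;\lambda)] / [\bar F^0(x;\lambda)\sqrt{V(\lambda)}] = \pm z_{1-\alpha/2}$ with $V(\lambda)$ given by the first expression in (3.10); the function names are hypothetical; and the data are assumed to have no ties, at least one death before $x$, and a product-limit estimate strictly between 0 and 1 at $x$.

```python
import math
from statistics import NormalDist


def exposed_at_deaths(times, deaths, x):
    """n_i = n - i + 1 for each ordered observation that is a death before x."""
    n = len(times)
    order = sorted(range(n), key=lambda k: times[k])
    return [n - i + 1 for i, k in enumerate(order, start=1)
            if deaths[k] and times[k] < x]


def constrained_survival(n_i, lam):
    """Product (3.8), with delta_i = 1 at each death."""
    value = 1.0
    for m in n_i:
        value *= (m + lam - 1) / (m + lam)
    return value


def standardized_gap(n_i, n, lam):
    """Left side of (3.9) in the form assumed in the lead-in."""
    v = (n + lam) / n * sum(1.0 / ((m + lam) * (m + lam - 1)) for m in n_i)
    s_hat = constrained_survival(n_i, 0.0)   # lambda = 0 gives the K.-M. estimate
    s_lam = constrained_survival(n_i, lam)
    return (s_hat - s_lam) / (s_lam * math.sqrt(v))


def bisect(f, lo, hi, iterations=200):
    """Plain bisection; f(lo) and f(hi) must have opposite signs."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def z2_limits(times, deaths, x, alpha=0.05):
    n = len(times)
    n_i = exposed_at_deaths(times, deaths, x)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # lambda_L < 0: the gap grows without bound as lambda -> -(min n_i - 1).
    left = -(min(n_i) - 1) + 1e-9
    lam_low = bisect(lambda lam: standardized_gap(n_i, n, lam) - z, left, 0.0)
    # lambda_U > 0: expand the bracket until the gap falls below -z.
    right = 1.0
    while standardized_gap(n_i, n, right) > -z:
        right *= 2.0
    lam_up = bisect(lambda lam: standardized_gap(n_i, n, lam) + z, 0.0, right)
    return constrained_survival(n_i, lam_low), constrained_survival(n_i, lam_up)


times = [1, 2, 4, 5, 7, 8, 10]
deaths = [1, 0, 1, 0, 0, 1, 1]
print(z2_limits(times, deaths, 6))
```

Bisection needs only a sign change over a bracket, so it avoids the derivative and starting-value issues of Newton-Raphson at the cost of more function evaluations. Unlike the $Z_1$ limits, the limits produced this way always lie inside $(0, 1)$, because they are values of the product (3.8).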
4.   Simulation Results

In order to compare the performance of the jackknife
procedure to the other candidates described above, namely $Z_1$
and $Z_2$, some of the particular cases treated by Thomas and
Grunkemeier (1975, p. 168ff.) were simulated, and nominal 95%
and 90% confidence limits were constructed. We summarize the
results in the following tables. Note that assessments are made
of interval performance at three probability-of-survival levels,
0.75, 0.50, and 0.25, for each combination of death and censoring
distributions.

Examination of the tabulations of confidence limit coverage,
and also of the averages and standard deviations of c.i. widths,
suggests that the jackknife confidence intervals perform in a generally
conservative manner as compared to the "Greenwood's formula"
results ($Z_1$) and the approximate likelihood ratio method ($Z_2$).
That is, JK tends to over-cover, while $Z_1$ consistently
under-covers; $Z_2$ has some tendency to under-cover with severe losses
(Case 1) and for small probabilities of survival but generally
performs well. Of the three estimating procedures, $Z_2$ is by
far the most difficult and expensive to carry out. The computer
time involved in computing $Z_2$ for $n = 50$ prohibited tabulation
of those results for this study. Note that the tendency of the
jackknife to over-cover is reduced as the probability of survival
decreases. Actually absurdly low values occur for survival
probabilities 0.50 and 0.25 in Case 1; they are a consequence of
the severe censoring assumed. In general, the results obtained
indicate that the jackknife procedure is a worthy competitor of
"Greenwood's formula" under present circumstances, and that it
performs only a little less effectively than does the
approximate likelihood-based procedure $Z_2$. The presented jackknife
technique tends to be conservative.

In order to supplement the above information, a number of
additional simulations were made to investigate the effect of
departure from the random censoring model. Specifically, the
censoring time, $Y_i$, was allowed to depend probabilistically
upon the time of death, $X_i$, for a sequence of experiments. A
selection of the results obtained is shown next.

In the above situations, in which $X_i$ and $Y_i$ are now
contrived to be positively dependent, once again the jackknife
tends to result in over-coverage, i.e. is conservative, and
sometimes radically so. This is to be contrasted with the Greenwood's
formula results, $Z_1$, which generally under-cover. Here there
is some indication that the likelihood ratio procedure, $Z_2$,
has a tendency to under-cover when the survival probability is
near 0.5. Of course, all results are for rather small sample
sizes, and refer to exponentially distributed deaths.

5.   Summary of Theoretical Developments

In this section a probability model for random censoring
is introduced. In terms of this model it will be shown that the
jackknife produces asymptotically correct confidence limits for the
survival probability from the Kaplan-Meier estimator. A priori
one could not be certain whether to systematically delete each
observation in turn when applying the jackknife or whether to
delete only the uncensored ones. Our results show that the proper
method is to delete each observation, censored or uncensored.

5.1.   The Model

Let $X^0_1, X^0_2, \ldots, X^0_n$ be independent random variables
distributed according to cdf $F^0(x)$, which is continuous with
$F^0(0) = 0$. In medical applications $X^0_i$ represents the survival
time of the $i$th patient, and in engineering reliability it
represents the time to failure of the $i$th equipment (or the $i$th
time to failure of an equipment, when appropriate). The problem is
to estimate $F^0$, but unfortunately the $X^0_i$ are not all directly
observable.

Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables,
identically distributed according to cdf $G$, the latter being
continuous with $G(0) = 0$. The observable variables are then

\[
X_i = \min\{ X^0_i, Y_i \} \quad \text{and} \quad \delta_i = I\{ X^0_i < Y_i \} ,
\tag{5.1}
\]

where $I\{A\}$ is the indicator function for event $A$. The $Y_i$
variables represent censoring times, and are assumed to be
independent of the $X^0_i$. The statistician actually observes the
smaller of the two variables, and also knows whether the observation
is uncensored (a "death") or censored (a "loss").

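
The random censoring model (5.1) is easy to simulate. The sketch below is illustrative only: exponential distributions are chosen for both $F^0$ and $G$ purely for convenience, and the function name, the rate parameters, and the seed are hypothetical choices made here rather than settings taken from the study.

```python
import random


def simulate_random_censoring(n, death_rate, censor_rate, seed=0):
    """Draw (X_i, delta_i) pairs under model (5.1).

    Assumption: X0_i ~ Exponential(death_rate), Y_i ~ Exponential(censor_rate),
    drawn independently of one another.
    """
    rng = random.Random(seed)
    times, deaths = [], []
    for _ in range(n):
        x0 = rng.expovariate(death_rate)    # true lifetime, not always observed
        y = rng.expovariate(censor_rate)    # censoring time
        times.append(min(x0, y))            # what the statistician observes
        deaths.append(1 if x0 < y else 0)   # 1 = death, 0 = loss
    return times, deaths


times, deaths = simulate_random_censoring(25, death_rate=1.0, censor_rate=0.5)
print(sum(deaths), "deaths and", len(deaths) - sum(deaths), "losses")
```

Under these exponential assumptions the long-run proportion of deaths is `death_rate / (death_rate + censor_rate)`, which gives a quick sanity check on the simulated censoring severity.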
5.2.   Cumulative  Hazard  Function 

The Kaplan-Meier estimator $F^0_n$ is closely related to the
sample cumulative hazard function (chf). The latter is defined
as

\[
\Lambda^0_n(x) = \sum_{i=1}^{n} \frac{\delta_{(i)}(x)}{n - i + 1} ,
\tag{5.2}
\]

where $\delta_{(i)}(x)$ is defined in (2.2). In fact Breslow and Crowley
(1974) show that

\[
-\ln\left[ 1 - F^0_n(x) \right] = \Lambda^0_n(x) + O(1/n) ,
\tag{5.3}
\]

and it may be shown that

\[
\Lambda^0_n(x) \xrightarrow{\text{a.s.}} \int_0^x \frac{dF^0(u)}{1 - F^0(u)} ,
\tag{5.4}
\]

the integral of the hazard function $\lambda^0(u) = dF^0(u) / [1 - F^0(u)]$;
both (5.3) and (5.4) justify the name given to $\Lambda^0_n$.

It is convenient to show that the jackknifed estimator of
$F^0$, denoted by $\tilde F^0_n(x)$, is asymptotically normal by starting
with $\Lambda^0_n$. If one shows that the jackknifed chf is asymptotically
normally distributed then it follows that $\tilde F^0_n(x)$ is also
asymptotically normal, as is true of other sufficiently smooth
functions (e.g. arc sine) of $F^0_n(x)$. If, in addition, it is shown
that the jackknife variance is consistent then the jackknife
confidence procedure illustrated in Section 4 is justified for large
sample sizes.
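
The relation between the sample chf (5.2) and the Kaplan-Meier estimate can be checked numerically. The following is a small Python sketch with hypothetical function names, reusing the seven-point example of Section 2; it assumes no ties.

```python
import math


def km_survival(times, deaths, x):
    """Product-limit estimate of survival to or beyond x, i.e. 1 - F_n(x)."""
    n = len(times)
    order = sorted(range(n), key=lambda k: times[k])
    estimate = 1.0
    for i, k in enumerate(order, start=1):
        if deaths[k] and times[k] < x:
            estimate *= (n - i) / (n - i + 1)
    return estimate


def sample_chf(times, deaths, x):
    """Sample cumulative hazard (5.2): sum of 1/(n-i+1) over deaths before x."""
    n = len(times)
    order = sorted(range(n), key=lambda k: times[k])
    return sum(1.0 / (n - i + 1)
               for i, k in enumerate(order, start=1)
               if deaths[k] and times[k] < x)


times = [1, 2, 4, 5, 7, 8, 10]
deaths = [1, 0, 1, 0, 0, 1, 1]
chf = sample_chf(times, deaths, 6)                     # 1/7 + 1/5
neg_log_km = -math.log(km_survival(times, deaths, 6))  # -ln(24/35)
print(chf, neg_log_km, neg_log_km - chf)
```

Since $-\ln(1-u) \ge u$ for $0 \le u < 1$, the negative log of the product-limit estimate is never smaller than the sample chf; the gap is the remainder term in (5.3).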




5.3.   Asymptotic  Normality 

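Let $\Lambda^0_{n-1}(x; i)$ be the sample chf when the $i$th ordered
observation $X_{(i)}$ is deleted from the sample. Then

\[
\Lambda^0_{n-1}(x; i) = \sum_{j=1}^{i-1} \frac{\delta_{(j)}(x)}{n - j}
+ \sum_{j=i+1}^{n} \frac{\delta_{(j)}(x)}{n - j + 1} .
\tag{5.5}
\]

The corresponding pseudo-value is

\[
\tilde\Lambda^0_n(x; i) = n \Lambda^0_n(x) - (n-1) \Lambda^0_{n-1}(x; i)
\]
\[
\phantom{\tilde\Lambda^0_n(x; i)} = \frac{n\, \delta_{(i)}(x)}{n - i + 1}
- \sum_{j=1}^{i-1} \frac{(j-1)\, \delta_{(j)}(x)}{(n-j)(n-j+1)}
+ \sum_{j=i+1}^{n} \frac{\delta_{(j)}(x)}{n - j + 1} .
\tag{5.6}
\]
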

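The jackknifed estimator is the average of the
pseudo-values. From (5.6),

\[
\tilde\Lambda^0_n(x) = \frac{1}{n} \sum_{i=1}^{n} \tilde\Lambda^0_n(x; i)
\]
\[
= \sum_{i=1}^{n} \frac{\delta_{(i)}(x)}{n - i + 1}
- \frac{1}{n} \sum_{i=2}^{n} \sum_{j=1}^{i-1} \frac{(j-1)\, \delta_{(j)}(x)}{(n-j)(n-j+1)}
+ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \frac{\delta_{(j)}(x)}{n - j + 1}
\]
\[
= \Lambda^0_n(x)
- \frac{1}{n} \sum_{j=1}^{n-1} \frac{(n-j)(j-1)\, \delta_{(j)}(x)}{(n-j)(n-j+1)}
+ \frac{1}{n} \sum_{j=2}^{n} \frac{(j-1)\, \delta_{(j)}(x)}{n - j + 1}
\tag{5.7}
\]
\[
= \Lambda^0_n(x) + \frac{n-1}{n}\, \delta_{(n)}(x) .
\]

Thus the jackknifed estimator and the original estimator
differ by an asymptotically negligible term. Now it has been
shown that $\Lambda^0_n(x)$ is asymptotically normal with mean
$\Lambda^0(x)$ and variance

\[
\frac{1}{n} \int_0^x \frac{dF^0}{(1 - F)(1 - F^0)}
\tag{5.8}
\]

(cf. Breslow and Crowley (1974), Theorem 4), where $F$ denotes the cdf
of the observed $X_i$, so that $1 - F = (1 - F^0)(1 - G)$ under the
model of Section 5.1. It follows that $\tilde\Lambda^0_n(x)$ has the
same asymptotic distribution.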
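
The closing identity in (5.7), namely that the jackknifed chf equals the sample chf plus $\frac{n-1}{n}\delta_{(n)}(x)$, can be verified numerically by brute-force deletion. The Python sketch below uses hypothetical function names and the seven-point example of Section 2, and assumes no ties.

```python
def sample_chf(times, deaths, x):
    """Sample cumulative hazard (5.2) for a sample with no ties."""
    n = len(times)
    order = sorted(range(n), key=lambda k: times[k])
    return sum(1.0 / (n - i + 1)
               for i, k in enumerate(order, start=1)
               if deaths[k] and times[k] < x)


def jackknifed_chf(times, deaths, x):
    """Average of the pseudo-values n*chf_n - (n-1)*chf_{n-1}(x; i)."""
    n = len(times)
    full = sample_chf(times, deaths, x)
    pseudo = []
    for i in range(n):
        rest_t = times[:i] + times[i + 1:]   # delete observation i
        rest_d = deaths[:i] + deaths[i + 1:]
        pseudo.append(n * full - (n - 1) * sample_chf(rest_t, rest_d, x))
    return sum(pseudo) / n


times = [1, 2, 4, 5, 7, 8, 10]
deaths = [1, 0, 1, 0, 0, 1, 1]
n = len(times)
for x in (6, 11):
    largest = max(range(n), key=lambda k: times[k])
    # delta_(n)(x): 1 if the largest observation is a death before x.
    delta_n = 1 if deaths[largest] and times[largest] < x else 0
    left = jackknifed_chf(times, deaths, x)
    right = sample_chf(times, deaths, x) + (n - 1) / n * delta_n
    print(x, left, right)
```

At $x = 6$ the largest observation has not yet occurred, so the two estimators coincide; at $x = 11$ the largest observation is a death before $x$ and the extra term $(n-1)/n$ appears.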

In order to study the Kaplan-Meier estimator, expand the
logarithm:

\[
\ln \bar F^0_n(x) = -\Lambda^0_n(x) - \frac{1}{2} \sum_{i=1}^{n} \frac{\delta_{(i)}(x)}{(n - i + 1)^2} - \cdots
\tag{5.9}
\]

Now jackknife, and observe that jackknifing the
second and higher order terms in (5.9) leads to expressions which
are $o(1/\sqrt{n})$, and so the jackknifed version of $\ln \bar F^0_n(x)$
has the same asymptotic (normal) distribution as $-\Lambda^0_n(x)$. Since
$\exp[\ln \bar F^0_n(x)] = \bar F^0_n(x)$, and the exponential function
is smooth (possesses a power-series expansion), it may be shown that
the normality of the jackknifed version of $\ln \bar F^0_n(x)$ implies
that of the jackknifed $\bar F^0_n(x)$. Furthermore, the asymptotic
normal distribution has mean $\bar F^0(x)$ and variance

\[
\frac{\left( 1 - F^0(x) \right)^2}{n} \int_0^x \frac{dF^0}{(1 - F)(1 - F^0)} .
\]

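
As an illustration of the asymptotic variance of the Kaplan-Meier estimate (the squared survival probability times the chf variance (5.8)), consider a special case that is not worked out in the text: exponential deaths with rate $\theta$ and independent exponential censoring with rate $\mu$. Both symbols are assumptions introduced here for the example. Then $1 - F^0(u) = e^{-\theta u}$, $1 - G(u) = e^{-\mu u}$, and the observed minimum has $1 - F(u) = e^{-(\theta+\mu)u}$, so the integral can be evaluated in closed form:

```latex
\[
\int_0^x \frac{dF^0(u)}{(1 - F(u))(1 - F^0(u))}
  = \int_0^x \frac{\theta e^{-\theta u}}{e^{-(\theta+\mu)u}\, e^{-\theta u}}\, du
  = \int_0^x \theta e^{(\theta+\mu)u}\, du
  = \frac{\theta}{\theta+\mu} \left( e^{(\theta+\mu)x} - 1 \right),
\]
\[
\text{asymptotic variance of the Kaplan-Meier estimate}
  \approx \frac{e^{-2\theta x}}{n} \cdot \frac{\theta}{\theta+\mu}
    \left( e^{(\theta+\mu)x} - 1 \right).
\]
% Check with no censoring (\mu = 0):
\[
\frac{e^{-2\theta x}}{n} \left( e^{\theta x} - 1 \right)
  = \frac{e^{-\theta x} \left( 1 - e^{-\theta x} \right)}{n},
\]
% which is the binomial variance of the empirical survival proportion.
```

The no-censoring check recovers the familiar binomial variance, and for $\mu > 0$ the variance grows faster with $x$, reflecting the information lost to censoring.
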
5.4.   Consistency  of  the  Sample  Variance 

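It may be shown that the sample variance of the pseudovalues
converges (a.s.) to the correct population variance, further
justifying the use of the jackknife for large samples. We merely
sketch the demonstration; see Miller (1975) for details. Begin
again by considering the pseudovalues obtained by jackknifing
the sample cumulative hazard function. From (5.5) the jackknife
variance estimate is given by

\[
n\, \widehat{\mathrm{Var}}\left[ \tilde\Lambda^0_n(x) \right]
\equiv \frac{1}{n-1} \sum_{i=1}^{n} \left( \tilde\Lambda^0_n(x; i) - \tilde\Lambda^0_n(x) \right)^2
\]
\[
= \frac{1}{n-1} \sum_{i=1}^{n} \left\{
\frac{n\, \delta_{(i)}(x)}{n - i + 1}
- \sum_{j=1}^{i-1} \frac{(j-1)\, \delta_{(j)}(x)}{(n-j)(n-j+1)}
+ \sum_{j=i+1}^{n} \frac{\delta_{(j)}(x)}{n - j + 1}
- \sum_{j=1}^{n} \frac{\delta_{(j)}(x)}{n - j + 1}
- \frac{n-1}{n}\, \delta_{(n)}(x) \right\}^2
\tag{5.10}
\]
\[
= \frac{1}{n-1} \sum_{i=1}^{n} \left\{
\frac{n\, \delta_{(i)}(x)}{n - i + 1}
- \sum_{j=1}^{i-1} \frac{(j-1)\, \delta_{(j)}(x)}{(n-j)(n-j+1)}
- \sum_{j=1}^{i} \frac{\delta_{(j)}(x)}{n - j + 1}
- \frac{n-1}{n}\, \delta_{(n)}(x) \right\}^2
\]
\[
= (n-1) \sum_{i=1}^{n} \left\{
\frac{\delta_{(i)}(x)}{n - i + 1}
- \sum_{j=1}^{i-1} \frac{\delta_{(j)}(x)}{(n-j)(n-j+1)}
- \frac{\delta_{(n)}(x)}{n} \right\}^2 .
\]
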

Now square and study the individual terms. In particular the
first sum of squares is

\[
(n-1) \sum_{i=1}^{n} \left\{ \frac{\delta_{(i)}(x)}{n - i + 1} \right\}^2
\;\overset{\text{a.s.}}{\sim}\;
\frac{1}{n} \sum_{i=1}^{n} \left( \frac{n}{n - i + 1} \right)^2 \delta_{(i)}(x)
\;\longrightarrow\;
\int_0^x \frac{dF^0}{(1 - F)(1 - F^0)} ,
\tag{5.11}
\]

agreeing with the correct value (5.8) multiplied by $n$.
Consequently the remaining terms must cancel out in the a.s. limit
in order that the jackknife variance function properly. The
steps are omitted here; see Miller (1975) for details. Finally,
the correctness of the jackknife variance for the sample chf extends
to the Kaplan-Meier estimate by previous arguments. It may also
be shown that the jackknife works properly on any estimator which
is a smooth-enough function of $F^0_n$; in particular the arc-sine,
log, or logistic transformations may all be jackknifed, which
justifies the approach taken in Sections 3 and 4.